
1,418 research outputs found

Learnable Front Ends Based on Temporal Modulation for Music Tagging

Author: Ma Yinghao
Stern Richard M.
Publication date: 28/11/2022

While end-to-end systems are becoming popular in auditory signal processing, including automatic music tagging, models that take raw audio as input without domain knowledge need large amounts of data and computational resources. Inspired by the fact that temporal modulation is regarded as an essential component in auditory perception, we introduce the Temporal Modulation Neural Network (TMNN) that combines Mel-like data-driven front ends and temporal modulation filters with a simple ResNet back end. The structure includes a set of temporal modulation filters to capture long-term patterns in all frequency channels. Experimental results show that the proposed front ends surpass state-of-the-art (SOTA) methods on the MagnaTagATune dataset in automatic music tagging, and they are also helpful for keyword spotting on speech commands. Moreover, the model performance for each tag suggests that genre or instrument tags with complex rhythm and mood tags can especially be improved with temporal modulation. Comment: Submitted to ICASSP 202

arXiv.org e-Print Archive

Signal Processing

Author: Bourne J. B.
Stern Richard M., Jr.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)
Publication date: 15/07/1972

Contains reports on two research projects. Joint Services Electronics Programs (U. S. Army, U. S. Navy, and U. S. Air Force) under Contract DAAB07-71-C-030

DSpace@MIT

Learning-based auditory encoding for robust speech recognition

Author: Bhiksha Raj
Richard M. Stern
Yu-hsiang Bosco Chiu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010

This paper describes ways of speeding up the optimization process for learning physiologically-motivated components of a feature computation module directly from data. During training, word lattices generated by the speech decoder and conjugate gradient descent were included to train the parameters of logistic functions in a fashion that maximizes the a posteriori probability of the correct class in the training data. These functions represent the rate-level nonlinearities found in most mammalian auditory systems. Experiments conducted using the CMU SPHINX-III system on the DARPA Resource Management and Wall Street Journal tasks show that the use of discriminative training to estimate the shape of the rate-level nonlinearity provides better recognition accuracy in the presence of background noise than traditional procedures which do not employ learning. More importantly, the inclusion of conjugate gradient descent optimization and a word lattice to reduce the number of hypotheses considered greatly increases the training speed, which makes training with much more complicated models possible. Index Terms: automatic speech recognition, discriminative training, auditory models, data analysis

CiteSeerX

Crossref

Online Active Learning For Sound Event Detection

Author: Kubala Francis
Lindsey Mark
Shah Ankit
Stern Richard M.
Publication date: 25/09/2023

Data collection and annotation is a laborious, time-consuming prerequisite for supervised machine learning tasks. Online Active Learning (OAL) is a paradigm that addresses this issue by simultaneously minimizing the amount of annotation required to train a classifier and adapting to changes in the data over the duration of the data collection process. Prior work has indicated that fluctuating class distributions and data drift are still common problems for OAL. This work presents new loss functions that address these challenges when OAL is applied to Sound Event Detection (SED). Experimental results from the SONYC dataset and two Voice-Type Discrimination (VTD) corpora indicate that OAL can reduce the time and effort required to train SED classifiers by a factor of 5 for SONYC, and that the new methods presented here successfully resolve issues present in existing OAL methods. Comment: Submitted to ICASSP 2024. Publication will belong to IEE

arXiv.org e-Print Archive

Gemini-South + FLAMINGOS Demonstration Science: Near-Infrared Spectroscopy of the z=5.77 Quasar SDSS J083643.85+005453.3

Author: Andrew J. Bunker
Daniel Stern
Iwata I.
Jon Willis
L. Felipe Barrientos
M. J. Ledlow
Patrick B. Hall
Prevot M. L.
Richard Elston
S. Nicholas Raines
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2003

We report an infrared 1-1.8 micron (J+H-bands), low-resolution (R=450) spectrogram of the highest-redshift radio-loud quasar currently known, SDSS J083643.85+005453.3, obtained during the spectroscopic commissioning run of the FLAMINGOS multi-object, near-infrared spectrograph at the 8m Gemini-South Observatory. These data show broad emission from both CIV 1549 and CIII] 1909, with strengths comparable to lower-redshift quasar composite spectra. The implication is that there is substantial enrichment of the quasar environment, even at times less than a billion years after the Big Bang. The redshift derived from these features is z = 5.774 +/- 0.003, more accurate and slightly lower than the z = 5.82 reported in the discovery paper based on the partially-absorbed Lyman-alpha emission line. The infrared continuum is significantly redder than lower-redshift quasar composites. Fitting the spectrum from 1.0 to 1.7 microns with a power law f(nu) ~ nu^(-alpha), the derived power law index is alpha = 1.55 compared to the average continuum spectral index = 0.44 derived from the first SDSS composite quasar. Assuming an SMC-like extinction curve, we infer a color excess of E(B-V) = 0.09 +/- 0.01 at the quasar redshift. Only approximately 6% of quasars in the optically-selected Sloan Digital Sky Survey show comparable levels of dust reddening. Comment: 10 pages, 1 figure; to appear in the Astrophysical Journal Letter

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive


Conceptual and Measurement Challenges in Research on Cognitive Reserve

Author: Glymour M. Maria
Jefferson Angela L.
Jones Richard
Manly Jennifer J.
Rentz Dorene M.
Stern Yaakov
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2011

Cognitive reserve, broadly conceived, encompasses aspects of brain structure and function that optimize individual performance in the presence of injury or pathology. Reserve is defined as a feature of brain structure and/or function that modifies the relationship between injury or pathology and performance on neuropsychological tasks or clinical outcomes. Reserve is challenging to study for two reasons. The first is that reserve is a hypothetical construct, and direct measures of reserve are not available. Proxy variables and latent variable models are used to attempt to operationalize reserve. The second is that in vivo measures of neuronal pathology are not widely available. It is challenging to develop and test models involving a risk factor (injury or pathology), a moderator (reserve) and an outcome (performance or clinical status) when neither the risk factor nor the moderator is measured directly. We discuss approaches for quantifying reserve with latent variable models, with emphasis on their application in the analysis of data from observational studies. Increasingly, latent variable models are used to generate composites of cognitive reserve based on multiple proxies. We review the theoretical and ontological status of latent variable modeling approaches to cognitive reserve, and suggest research strategies for advancing the field.

Columbia University Academic Commons

PubMed Central